
8 research outputs found

Thermal-Aware Networked Many-Core Systems

Author: Vaddina Kameswar Rao
Publication venue: Turku Centre for Computer Science
Publication date: 23/05/2014

Advancements in IC processing technology have driven the innovation and growth of the consumer electronics sector and the evolution of the IT infrastructure supporting this exponential growth. One of the most difficult obstacles to this growth is the removal of the large amount of heat generated by the processing and communicating nodes on the system. The scaling down of technology and the increase in power density have a direct effect on the rise in temperature. This has increased cooling budgets and affects both the lifetime reliability and the performance of the system. Hence, reducing on-chip temperatures has become a major design concern for modern microprocessors. This dissertation addresses the thermal challenges at different levels for both 2D planar and 3D stacked systems. It proposes a self-timed thermal monitoring strategy based on the liberal use of on-chip thermal sensors, which relies on noise-variation-tolerant and leakage-current-based thermal sensing. In order to study thermal management issues from the early design stages, accurate thermal modeling and analysis at design time is essential. In this regard, the spatial temperature profile of the global Cu nanowire for on-chip interconnects has been analyzed. The dissertation presents a 3D thermal model of a multicore system in order to investigate the effects of hotspots and of the placement of silicon die layers on the thermal performance of a modern flip-chip package. For a 3D stacked system, the primary design goal is to maximise performance within the given power and thermal envelopes. Hence, a thermally efficient routing strategy for 3D NoC-Bus hybrid architectures has been proposed to mitigate on-chip temperatures by herding most of the switching activity to the die that is closer to the heat sink. Finally, an exploration of various thermal-aware placement approaches for both 2D and 3D stacked systems has been presented.
Various thermal models have been developed and thermal control metrics have been extracted. An efficient thermal-aware application mapping algorithm for a 2D NoC has been presented. It has been shown that the proposed mapping algorithm reduces the effective area subjected to high temperatures when compared to the state of the art.

UTUPub

Thermal modeling and analysis of advanced 3D stacked structures

Author: Latif Khalid
Liljeberg Pasi
Plosila Juha
Rahmani Amir-Mohammad
Vaddina Kameswar Rao
Publication venue: Published by Elsevier Ltd.
Publication date: 31/12/2012

The emerging three-dimensional integrated circuits (3D ICs) offer a promising solution to mitigate the barriers of interconnect scaling in modern systems. They also provide greater design flexibility by allowing heterogeneous integration. However, 3D technology exacerbates on-chip thermal issues and increases packaging and cooling costs. In this work, a 3D thermal model of a stacked system is developed, and thermal analysis is performed using finite element simulations in order to analyze different workload conditions. A steady-state heat transfer analysis of the 3D stacked structure has been performed to analyze the effect of variations in die power consumption, with and without hotspots, on the temperature in different layers of the stack. We have also investigated the effect that the interaction of hotspots has on peak temperature.
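To give a rough feel for why layer position matters in a stack, the sketch below uses a one-dimensional lumped thermal-resistance chain. This is a deliberate simplification and not the finite element model the abstract describes. The function name `stack_temperatures`, the resistance values, and the power figures are all hypothetical. Layer 0 is assumed to sit next to the heat sink, and all heat is assumed to leave through the sink.

```python
def stack_temperatures(powers_w, resistances_k_per_w, ambient_c):
    """Steady-state temperature of each die in a 1D lumped stack (assumed model).

    powers_w[i]            power dissipated in layer i (layer 0 is nearest the sink)
    resistances_k_per_w[i] thermal resistance between layer i and layer i-1
                           (for i == 0: between layer 0 and ambient via the sink)
    """
    if len(powers_w) != len(resistances_k_per_w):
        raise ValueError("one resistance is required per layer")

    temps = []
    temp = ambient_c
    # Heat crossing a resistance is the power of that layer plus every layer
    # farther from the sink, because all of it must flow towards the sink.
    heat_through = sum(powers_w)
    for power, resistance in zip(powers_w, resistances_k_per_w):
        temp += resistance * heat_through
        temps.append(temp)
        heat_through -= power
    return temps


# Hypothetical two-die stack: same total power, different distribution.
balanced = stack_temperatures([10.0, 10.0], [0.5, 0.2], ambient_c=25.0)
near_sink = stack_temperatures([15.0, 5.0], [0.5, 0.2], ambient_c=25.0)
```

In this toy model, shifting power towards the die nearest the sink lowers the peak temperature of the stack, because less heat has to cross the inter-die resistance.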

Elsevier - Publisher Connector

Evaluation of A Low-power Random Access Memory Generator

Author: Kameswar Rao Vaddina
Publication venue: Institutionen för systemteknik
Publication date: 01/01/2006

In this work, an existing RAM generator is analysed and evaluated. The aspects considered in the evaluation include the optimization of the basic SRAM cell, how the RAM generator can be ported to newer technologies, the automation of the simulation process, and the creation of a workflow for the energy model. One of the main focuses of this thesis work is to optimize the basic SRAM cell. The SRAM cell used in the RAM generator is optimized for neither area nor power. A compact layout is suggested that saves considerable area and power. The technology used to create the RAM generator is old, and a suitable way to port it to a newer technology has also been found. To create an energy model, one has to simulate many memories with large amounts of data. This cannot be done in the traditional way of simulating circuits through the GUI. Hence, an automation procedure has been suggested that can be used to create energy models by simulating the memories comprehensively. Finally, the basic groundwork has been laid by creating a workflow for building the energy model.

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Experimental Workflow for Energy and Temperature Profiling on HPC Systems

Author: Lefèvre Laurent
Orgerie Anne-Cécile
Rao Vaddina Kameswar
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 05/09/2021

Despite recent advances in improving the performance of high performance computing (HPC) and distributed systems, power dissipation and thermal cooling challenges persist, impacting their total cost of ownership. Making HPC systems more energy- and thermal-efficient will require an understanding of the individual power dissipation and temperature contributions of multiple hardware system components and their accompanying software. In this work, we present an experimental workflow for energy and temperature profiling on systems running parallel applications. It allows full and dynamic control over the execution of applications across the entire frequency range. Through its use, we show that the energy response to frequency scaling is highly dependent on the workload characteristics and is convex in nature, with an optimal frequency point. During our experimentation, we encountered a non-intuitive finding: the tested low-power processor consumed more power on average than the standard processor.
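The convex energy response mentioned above can be illustrated with a textbook toy model. The sketch below is an assumption and not the model used in the paper: static power is constant, dynamic power grows with the cube of the clock frequency, and a CPU-bound task needs a fixed number of cycles. The names `energy_joules` and `optimal_frequency_hz` and the constants used with them are hypothetical.

```python
def energy_joules(freq_hz, cycles, p_static_w, k):
    """Energy of a CPU-bound task under an assumed power law.

    Assumed model: P(f) = p_static_w + k * f**3 and runtime = cycles / f,
    so E(f) = cycles * (p_static_w / f + k * f**2), which is convex for f > 0.
    """
    runtime_s = cycles / freq_hz
    power_w = p_static_w + k * freq_hz ** 3
    return power_w * runtime_s


def optimal_frequency_hz(p_static_w, k):
    """Frequency minimising E(f) in the assumed model.

    Setting dE/df = cycles * (-p_static_w / f**2 + 2 * k * f) to zero gives
    f* = (p_static_w / (2 * k)) ** (1/3).
    """
    return (p_static_w / (2.0 * k)) ** (1.0 / 3.0)


# Hypothetical constants: 0.5 W static power, 1 W dynamic power at 1 GHz.
f_opt = optimal_frequency_hz(p_static_w=0.5, k=1e-27)
e_opt = energy_joules(f_opt, cycles=1e9, p_static_w=0.5, k=1e-27)
```

Running slower than `f_opt` wastes static energy over a longer runtime, while running faster pays more in dynamic power than it saves in time. In this model the optimum depends only on the ratio of static to dynamic power, so workloads with different power profiles have different optimal frequencies.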

HAL-ENS-LYON

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

Experimental Assessment and Biaffine Modeling of the Impact of Ambient Temperature on SoC Power Requirements

Author: Brandner Florian
Jouvelot Pierre
Memmi Gérard
Rao Vaddina Kameswar
Publication venue: HAL CCSD
Publication date: 29/06/2024

Based on fundamental physics-based considerations, we introduce the Biaffine Temperature-Voltage power model (BiTV) for SoC systems, which takes the influence of dynamic voltage, frequency, and ambient temperature conditions into account. Using an ARM Cortex-based AM572x system operating in a temperature-controlled oven, we provide experimental evidence of the validity of the BiTV power model over a significant range of ambient temperatures (25 to 55 °C), voltages (0.98 to 1.23 V) and frequencies (100 to 1,500 MHz). These experiments and the BiTV model provide quantitative elements to assess the impact of ambient temperature on systems’ performance. Such insights could be of use to system designers and compiler writers, in particular when dealing with embedded systems operating in harsh conditions or under energy-critical constraints.
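For readers unfamiliar with the term, a function of two variables is biaffine when it is affine in each variable with the other held fixed. The sketch below shows that generic form and how its four coefficients can be recovered from measurements at the four corners of a temperature/voltage rectangle. It is only an illustration of the mathematical shape: the actual BiTV formulation, including how frequency enters it, is not given in the abstract, and the coefficient values and function names here are hypothetical.

```python
def biaffine(temp, volt, a, b, c, d):
    """Generic biaffine form: affine in temp for fixed volt, and vice versa."""
    return a + b * temp + c * volt + d * temp * volt


def fit_from_corners(t0, t1, v0, v1, p00, p10, p01, p11):
    """Recover (a, b, c, d) from four corner samples.

    p10 is the value measured at (t1, v0), p01 at (t0, v1), and so on.
    """
    dt = t1 - t0
    dv = v1 - v0
    # The mixed difference isolates the cross term d * temp * volt.
    d = (p11 - p10 - p01 + p00) / (dt * dv)
    # Slopes along each axis at the (t0, v0) corner include the cross term.
    b = (p10 - p00) / dt - d * v0
    c = (p01 - p00) / dv - d * t0
    a = p00 - b * t0 - c * v0 - d * t0 * v0
    return a, b, c, d


# Corner points taken from the ranges quoted in the abstract; the power
# values are generated from made-up coefficients, not from measurements.
true_coeffs = (0.2, 0.004, 1.5, 0.01)
corners = [biaffine(t, v, *true_coeffs)
           for t, v in [(25.0, 0.98), (55.0, 0.98), (25.0, 1.23), (55.0, 1.23)]]
fitted = fit_from_corners(25.0, 55.0, 0.98, 1.23, *corners)
```

Four samples determine the model exactly, so a real experiment would normally use more points and a least-squares fit to average out measurement noise.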

HAL-MINES ParisTech

Experimental Energy Profiling of Energy-Critical Embedded Applications

Author: Brandner Florian
Jouvelot Pierre
Memmi Gérard
Rao Vaddina Kameswar
Publication venue: HAL CCSD
Publication date: 21/09/2017

Despite recent advances that have greatly improved the performance of embedded systems, we still face many challenges with regard to energy consumption in energy-constrained embedded and communication platforms. Optimizing applications for energy consumption remains a challenge and thus is a compelling research direction, both on the practical and theoretical sides. This paper presents a new experimental bench for energy profiling of non-performance-critical embedded and mobile applications and reports preliminary results obtained on two embedded boards. The experiments are driven by an online energy monitoring mechanism using National Instruments' cDAQ and LabVIEW running on a host machine. The host monitors a target device, which runs a set of benchmarks. We describe the experience gained from using and modifying two different target boards, namely an Nvidia Jetson TX1 and a TI AM572x evaluation module. In particular, we confirm, and thus further validate, the existence of the Energy/Frequency Convexity Rule for CPU-bound benchmarks. This rule states that there exists an optimal clock frequency that minimizes the CPU energy consumption for non-performance-critical applications. We also show that the gain of frequency scaling is highly dependent on workload characteristics. Any future energy-management approach should take these behavioral traits into consideration.
I. INTRODUCTION
Continuous CMOS technology scaling (Moore's law) increases the on-chip power density due to the higher transistor integration. As power density increases, many factors like power dissipation, leakage, data activity, and electro-migration contribute to higher on-chip temperatures. The increase in temperature leads to an increase in leakage power, thereby increasing the total energy dissipation and thus forming part of a vicious circle that significantly limits system performance.
The bulk of today's computing does not happen on desktops, laptops, servers, or data centers, but rather on embedded media devices like mobile phones [1]. The embedded computing applications running on those devices demand better energy efficiency and flexibility in operation, while delivering better performance per Watt. At the same time, they cannot compete with application-specific integrated circuits (ASIC) in terms of energy efficiency. Indeed, a well-designed ASIC can achieve an efficiency of 5 pJ/op in a 90-nm CMOS process, whereas a very efficient embedded processor would require about 250 pJ/op. That means the embedded processor may consume about 50 times more energy than a custom designed ASIC [1]. Today's system-on-a-chip (SoC) platforms have a lot of software acting in unison, trying to deliver a seamless use

HAL Descartes

HAL-MINES ParisTech

Hal-Diderot

Modeling the energy consumption of programs: thermal aspects and Energy/Frequency Convexity Rule

Author: Brandner Florian
de Vogeleer Karel
Jouvelot Pierre
Memmi Gérard
Rao Vaddina Kameswar
Publication venue: HAL CCSD
Publication date: 11/10/2017

This article summarizes our current studies aiming at a better understanding of the energy consumption of a microprocessor during the execution of an application, through a combination of theoretical results and experimental validations. The analysis of the transient thermal behavior and the energy gains (ranging from 20 to 40% in some cases) obtained by adapting the clock frequency are of obvious practical interest. A general Passive Cooling Rule (PCR) for an isothermal object subjected to radiation, convection and internal heat generation is proposed. This power-temperature model is observed on an Exynos 5410 processor. Several approximations to this cooling rule are formulated for practical use, particularly online. They are accompanied by general rules for assessing when passive cooling becomes non-negligible compared to active cooling in embedded systems. On the other hand, a theoretical framework for the existence of an Energy/Frequency Convexity Rule (EFCR) of program consumption is established. It is validated both by the state of the art and by experimental measurements in which the impact of varying multiple parameters is studied. Power requirement models are then explained for the Exynos 5410, integrating the clock frequency, temperature and number of active cores. The novelty of these models is that they take into account certain characteristics of the running programs and that they can be directly reused in any simulation for other processors of similar architecture.
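The abstract does not give the Passive Cooling Rule itself, so the sketch below falls back on the standard lumped energy balance for an isothermal body with internal heat generation, convection and radiation. It should be read as a generic physics illustration rather than the paper's formulation. The function names and every parameter value (heat transfer coefficient, surface area, emissivity) are hypothetical. Temperatures are in kelvin.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)


def net_heat_w(temp_k, power_w, ambient_k, h, area_m2, emissivity):
    """Internal heat generation minus convective and radiative losses."""
    convection = h * area_m2 * (temp_k - ambient_k)
    radiation = emissivity * SIGMA * area_m2 * (temp_k ** 4 - ambient_k ** 4)
    return power_w - convection - radiation


def steady_state_temp_k(power_w, ambient_k, h, area_m2, emissivity, tol=1e-6):
    """Temperature at which losses balance the generated power (bisection).

    Assumes power_w >= 0, h > 0 and area_m2 > 0. Losses grow monotonically
    with temperature, so the balance point is unique.
    """
    low = ambient_k                              # no losses here: net heat >= 0
    high = ambient_k + power_w / (h * area_m2)   # convection alone removes all power
    while high - low > tol:
        mid = 0.5 * (low + high)
        if net_heat_w(mid, power_w, ambient_k, h, area_m2, emissivity) > 0.0:
            low = mid    # still heating up: the balance point is hotter
        else:
            high = mid   # losses dominate: the balance point is cooler
    return 0.5 * (low + high)


# Hypothetical small package: 2 W, 10 W/(m^2 K), 0.01 m^2 of surface.
t_convection_only = steady_state_temp_k(2.0, 298.15, 10.0, 0.01, emissivity=0.0)
t_with_radiation = steady_state_temp_k(2.0, 298.15, 10.0, 0.01, emissivity=0.9)
```

Comparing the two results gives a feel for how much radiation contributes next to convection for a given geometry.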

HAL-MINES ParisTech

PT-BAR: Prioritized Thermo-Buffer Based Adaptive Routing Protocol for Network-on-chip

Author: Arjun
Ascia
Ayse Kivilcim Coskun
BKahng
Chiu Ge-Ming
Christopher James Glass
Chunsheng Liu
Das
Emilio Martinez
Fan
Fu
Giuseppe Ascia
Gratz
Greg
Holsmark
Hsien-Kai Hsin
Jingcao Hu
Junhui Wang
Kameswar Rao Vaddina
Kumar Amit
Lankes
Li
Lionel Vogt
Liu Peng
Ma
Mohapatra
Nilsson
Sheng Xu
Shervin
Skadron Kevin
Wang Ling
Wei Hung
Xiang Dong
Yu-Wei Yang
Publication venue: Elsevier BV
Publication date: 01/01/2016

Crossref